23 research outputs found
Model Inversion Attack via Dynamic Memory Learning
Model Inversion (MI) attacks aim to recover the private training data from
the target model, which has raised security concerns about the deployment of
DNNs in practice. Recent advances in generative adversarial models have
rendered them particularly effective in MI attacks, primarily due to their
ability to generate high-fidelity and perceptually realistic images that
closely resemble the target data. In this work, we propose a novel Dynamic
Memory Model Inversion Attack (DMMIA) to leverage historically learned
knowledge, which interacts with samples (during the training) to induce diverse
generations. DMMIA constructs two types of prototypes to inject the information
about historically learned knowledge: Intra-class Multicentric Representation
(IMR) representing target-related concepts by multiple learnable prototypes,
and Inter-class Discriminative Representation (IDR) characterizing the
memorized samples as learned prototypes to capture more privacy-related
information. As a result, our DMMIA has a more informative representation,
which brings more diverse and discriminative generated results. Experiments on
multiple benchmarks show that DMMIA performs better than state-of-the-art MI
attack methods
Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning
Table-based reasoning has shown remarkable progress in combining deep models
with discrete reasoning, which requires reasoning over both free-form natural
language (NL) questions and structured tabular data. However, previous
table-based reasoning solutions usually suffer from significant performance
degradation on huge evidence (tables). In addition, most existing methods
struggle to reason over complex questions since the required information is
scattered in different places. To alleviate the above challenges, we exploit
large language models (LLMs) as decomposers for effective table-based
reasoning, which (i) decompose huge evidence (a huge table) into sub-evidence
(a small table) to mitigate the interference of useless information for table
reasoning; and (ii) decompose complex questions into simpler sub-questions for
text reasoning. Specifically, we first use the LLMs to break down the evidence
(tables) involved in the current question, retaining the relevant evidence and
excluding the remaining irrelevant evidence from the huge table. In addition,
we propose a "parsing-execution-filling" strategy to alleviate the
hallucination dilemma of the chain of thought by decoupling logic and numerical
computation in each step. Extensive experiments show that our method can
effectively leverage decomposed evidence and questions and outperforms the
strong baselines on TabFact, WikiTableQuestion, and FetaQA datasets. Notably,
our model outperforms human performance for the first time on the TabFact
dataset.Comment: SIGIR 202
Iterative Forward Tuning Boosts In-context Learning in Language Models
Large language models (LLMs) have exhibited an emergent in-context learning
(ICL) ability. However, the ICL models that can solve ordinary cases are hardly
extended to solve more complex tasks by processing the demonstration examples
once. This single-turn ICL is incoordinate with the decision making process of
humans by learning from analogy. In this paper, we propose an effective and
efficient two-stage framework to boost ICL in LLMs by exploiting a dual form
between Transformer attention and gradient descent-based optimization.
Concretely, we divide the ICL process into "Deep-Thinking" and inference
stages. The "Deep-Thinking" stage performs iterative forward optimization of
demonstrations, which is expected to boost the reasoning abilities of LLMs at
test time by "thinking" demonstrations multiple times. It produces accumulated
meta-gradients by manipulating the Key-Value matrices in the self-attention
modules of the Transformer. Then, the inference stage only takes the test query
as input without concatenating demonstrations and applies the learned
meta-gradients through attention for output prediction. In this way,
demonstrations are not required during the inference stage since they are
already learned and stored in the definitive meta-gradients. LLMs can be
effectively and efficiently adapted to downstream tasks. Extensive experiments
on ten classification and multiple-choice datasets show that our method
achieves substantially better performance than standard ICL in terms of both
accuracy and efficiency.Comment: 14 pages, 5 figure
TransAudio: Towards the Transferable Adversarial Audio Attack via Learning Contextualized Perturbations
In a transfer-based attack against Automatic Speech Recognition (ASR)
systems, attacks are unable to access the architecture and parameters of the
target model. Existing attack methods are mostly investigated in voice
assistant scenarios with restricted voice commands, prohibiting their
applicability to more general ASR related applications. To tackle this
challenge, we propose a novel contextualized attack with deletion, insertion,
and substitution adversarial behaviors, namely TransAudio, which achieves
arbitrary word-level attacks based on the proposed two-stage framework. To
strengthen the attack transferability, we further introduce an audio
score-matching optimization strategy to regularize the training process, which
mitigates adversarial example over-fitting to the surrogate model. Extensive
experiments and analysis demonstrate the effectiveness of TransAudio against
open-source ASR models and commercial APIs
Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark
Existing multimodal task-oriented dialog data fails to demonstrate the
diverse expressions of user subjective preferences and recommendation acts in
the real-life shopping scenario. This paper introduces a new dataset SURE
(Multimodal Recommendation Dialog with SUbjective PREference), which contains
12K shopping dialogs in complex store scenes. The data is built in two phases
with human annotations to ensure quality and diversity. SURE is well-annotated
with subjective preferences and recommendation acts proposed by sales experts.
A comprehensive analysis is given to reveal the distinguishing features of
SURE. Three benchmark tasks are then proposed on the data to evaluate the
capability of multimodal recommendation agents. Based on the SURE, we propose a
baseline model, powered by a state-of-the-art multimodal model, for these
tasks.Comment: ACL 202
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
Perceiving multi-modal information and fulfilling dialogues with humans is a
long-term goal of artificial intelligence. Pre-training is commonly regarded as
an effective approach for multi-modal dialogue. However, due to the limited
availability of multi-modal dialogue data, there is still scarce research on
multi-modal dialogue pre-training. Yet another intriguing challenge emerges
from the encompassing nature of multi-modal dialogue, which involves various
modalities and tasks. Moreover, new forms of tasks may arise at unpredictable
points in the future. Hence, it is essential for designed multi-modal dialogue
models to possess sufficient flexibility to adapt to such scenarios. This paper
proposes \textbf{PaCE}, a unified, structured, compositional multi-modal
dialogue pre-training framework. It utilizes a combination of several
fundamental experts to accommodate multiple dialogue-related tasks and can be
pre-trained using limited dialogue and extensive non-dialogue multi-modal data.
Furthermore, we propose a progressive training method where old experts from
the past can assist new experts, facilitating the expansion of their
capabilities. Experimental results demonstrate that PaCE achieves
state-of-the-art results on eight multi-modal dialog benchmarks.Comment: ACL 202
An Investigation of LLMs' Inefficacy in Understanding Converse Relations
Large Language Models (LLMs) have achieved remarkable success in many formal
language oriented tasks, such as structural data-to-text and semantic parsing.
However current benchmarks mostly follow the data distribution of the
pre-training data of LLMs. Therefore, a natural question rises that do LLMs
really understand the structured semantics of formal languages. In this paper,
we investigate this problem on a special case, converse binary relation. We
introduce a new benchmark ConvRe focusing on converse relations, which contains
17 relations and 1240 triples extracted from popular knowledge graph completion
datasets. Our ConvRE features two tasks, Re2Text and Text2Re, which are
formulated as multi-choice question answering to evaluate LLMs' ability to
determine the matching between relations and associated text. For the
evaluation protocol, apart from different prompting methods, we further
introduce variants to the test text and few-shot example text. We conduct
experiments on three popular LLM families and have observed various scaling
trends. The results suggest that LLMs often resort to shortcut learning and
still face challenges on our proposed benchmark.Comment: Accepted by EMNLP 202
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Visually-grounded dialog systems, which integrate multiple modes of
communication such as text and visual inputs, have become an increasingly
popular area of investigation. However, the absence of a standardized
evaluation framework poses a challenge in assessing the development of this
field. To this end, we propose \textbf{VDialogUE}, a \textbf{V}isually-grounded
\textbf{Dialog}ue benchmark for \textbf{U}nified \textbf{E}valuation. It
defines five core multi-modal dialogue tasks and covers six datasets.
Furthermore, in order to provide a comprehensive assessment of the model's
performance across all tasks, we developed a novel evaluation metric called
VDscore, which is based on the Analytic Hierarchy Process~(AHP) method.
Additionally, we present a straightforward yet efficient baseline model, named
\textbf{VISIT}~(\textbf{VIS}ually-grounded d\textbf{I}alog
\textbf{T}ransformer), to promote the advancement of general multi-modal
dialogue systems. It progressively builds its multi-modal foundation and
dialogue capability via a two-stage pre-training strategy.
We believe that the VDialogUE benchmark, along with the evaluation scripts
and our baseline models, will accelerate the development of visually-grounded
dialog systems and lead to the development of more sophisticated and effective
pre-trained models